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1. INTRODUCTION 

The peaks and troughs detection is a very important task in many real-world applications in different 
fields such as bioinformatics [1], mass spectrometry [2], image processing [3] and signal processing [4], [5]. 
Because trough detection is equivalent to peak detection when we implement the peak detection for the 
additive inverse of the data, the algorithms are usually designed for peak detection. In signal processing, peak 
detection is the key step. However, the challenge of the peak detection is usually caused by noise where the 
light peaks are detected and the strong peaks are ignored. 

Nowadays, with the explosion of smart devices and automatic intelligent control systems along with 
the ever-increasing volume of data, signal analysis is more difficult because of the increase in noise as well 
as computational requirements [6], [7]. Many solutions have been proposed for peak detection in signal 
processing in the literature ranging from the adaptive threshold method (ATM) [8], wavelet transform 
techniques [9], [10], hidden Markov models [11], k-means clustering [12], and entropy-based techniques [13] 
to the automatic chromatographic peak detection (ACPD) [14], the peak of Shannon energy envelope (PSEE) 
[15], mountaineer’s method [16], automatic multi-scale peak detection (AMPD) [17], [18] and ATPD [19]. 
But most of them are designed for a specific type of signals. Only a few algorithms have been developed for 
general signals. 

Some algorithms are designed only for specific signals. For instance, ACPD is designed for 
chromatographic signals [20]-[22], and the PSEE [23]-[25] are designed for electrocardiogram (ECG) 
signals, while the ATM and others [16], [26]-[28] are designed for photoplethysmorgraphy (PPG) signals. 
Some other algorithms (but fewer) are designed for peak detection in general such as AMPD and ATPD [13], 
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[29], [30]. In which, the AMPD has more advantages because it operates without any parameters and quite 
robust regardless high and low frequency noise and detects peaks accurately in quasi-periodic signals. 

Besides the accuracy, the computational complexity of peak detection algorithms is also main 
concern for researchers, especially in big data analysis for real-world problems today. Therefore, our aim is 
to modify the original AMPD, an algorithm that is suitable for many type of signals without any parameters, 
to improve its runtime performance and the memory storage requirements. The modified version is then 
compared to the original AMPD in both peak detection accuracy and runtime performance. Finally, some 
suggestions for improving peak detection algorithms are recommended. 


2. METHOD 
In this section, we take an overview of the original AMPD and point out some necessary 
optimizations for it. Finally, some improvements are proposed. 


2.1. Overview of the original AMPD 

Given X = (%1,X2,...,Xy) is a sequence of observations from a univariate signal which the periodic 
or quasi-periodic peaks are involved. At the first stage, the AMPD calculates the local maxima scalogram 
(LMS) L x N matrix where L = [N/2] — 1 (notice that [z] is the smallest integer that is not less than z) and 
N is the length of X. For each k in {1,2,...,L}, the k -th rows of LMS allocates all the local maxima 
positions of X at scale k (the level of zoom for peak finding) at time t by assigning its element, m[k, t], equal 
to zero if X; > xX;_, and x, > X:4, for each t =k + 1 to N — k. Otherwise, m[k,t] = æ +r (where r is a 
random number from [0, 1]-uniformly distribution and «æ € R is a constant). In [17], æ is fixed by 1. The k - 
th row of the MLS is calculated as in Figure 1 fort =k +1 to N — k. 
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Figure 1. LMS calculation of the AMPD 


The first row of LMS has the highest resolution of the potential peak points and it reduces to the 
“low resolution” for each a half of the length of the signal. The second step is cutting the LMS by the r -th 
row to get LMSr matrix containing the r first rows of LMS, where r is the scale with greatest number of 
potential peaks (i.e. the greatest number of zeros). At the final stage of AMPD, the column-wise standard 
deviation of each column of LMSr is calculated. The position of the column where the standard deviation is 
equal to zero identify the maxima locations ofX. The Figure 2 (Figure 1 in [17]) shows the steps of the 
AMPD algorithm and Figure 3 (Figure 2 in [17]) visualizes the AMPD process by an example. 
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Figure 2. Diagram of AMPD algorithm (Figure 1. in [17]) 


2.2. Some necessary optimizations for AMPD 
There are two main observations are necessary optimized for the AMPD. Firstly, the most 
computational expensive in AMPD comes from calculating the LMS matrix. It has an O(N?) upper complexity 
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bound of memory requirements. Moreover, the LMS using the scales up to L = [N/2] — 1 is unnecessary. In 
the example given in Figures 3(a) to (f) (see [17]) for more detail), we can see that the number of necessary 
scales in the LMS is at the dramatically low rate (i.e. only first 55 significant scales against 1500 in total). 
Secondly, the calculation of column-wise standard deviations to find the columns with zero value in the final 
stage is equivalent to finding the intersection of r first rows of the LMS matrix. Therefore, the calculation of 
uniform random number for fulling the void of the LMS is very computationally expensive. 
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Figure 3. An example of applying the AMPD to a simulated signal, (a) LMS, (b) rescaled LMS, (c) number 


of maxima for each scale, (d) standard deviation for each column of rescaled LMS, (e) true peaks, and 
(f) zoomed true peaks 


2.3. Some improvements of the T-AMPD 

In this section, we propose some solutions to overcome the obstacles of the original AMPD 
mentioned in section 2.2, the modified algorithm called T-AMPD. For L valued reducing, we can consider 
setting it with a fixed number. For example, if the length of X is greater than 5000 then set 
L = [5000/2] — 1, otherwise, L = [N/2] — 1 as used in the R package ampd in [31]. In other respects, if we 
call S, the search space at the scale k then S4 = {Xxx,...,Xy_x} (see Figure 1). Suppose that ng (No = 1) is the 
number of desirable peaks need to be detected in the signal and k is the last scale that MLS[k, ] containing all 
of these peaks then they are all of S4. Therefore, the highest frequency of quasi-periodic signal is equal to: 


N-2k 

foi (1) 
for no = 2 in general. 

Because k is the last scale that MLS[k,] containing all of the true peaks, there is not any peak in 
Sy \ S, and S; \ S4. It means that f > k. Now from (1) we have: 


N 
ko (2) 


An optimized algorithm for peak detection in noisy periodic and quasi-periodic signals (Luc Tri Tuyen) 


2028 O ISSN: 2302-9285 


In (2) implies that L should be equal to [N/(no + 1)] — 1. 

In general cases, the computational resource in this proposal with ng = 2 is dramatically cut down 
in comparison to the original algorithm. The suitable L scales with different nọ are illustrated as in Figure 4 
where an example of MLS calculation was given by [17]. When nọ = 1 then L = [N/2] —1 as in the 
original AMPD. But this is unusual case in most of real-world applications. Notice that if no is greater than 
the number of true peaks in the signal, high rate of false positive errors can happen. From the number of 
peaks is unknown and depend on the highest frequency of the signal, we can choose ny by small number (i.e. 
No = 2,3,...) OF No = [N/fmax]| if the information about the highest frequency of the signal, fmax, is known. 
This case is available in many facts (i.e., ECG, PPG signals). 


Local maxima scalogram (LMS), M 


1 1500 3000 
Time [s] 


Figure 4. The suitable L with different nọ (an example of [17]) 


For optimization of the LMS calculation, for each k in 1 toL, the original AMPD (i.e. in [17] and 
another modified AMPD in [18]) finds the local maxima in the search space S, by taking each t in k + 1 to 
N —k and checking whether x, > x,_, and x; > X;-x41. If it is true, the value LMS[k,t] =0 and 
LMS{[k,t] = r + a otherwise. This way causes much waste of computer resources while it is equivalent to 
comparing the elements of S4 with S, and S} correspondingly (see the Figure 1 for more details). From S4, S2 
and S3 have the same length, instead of using two for-loops for finding the local maxima, we can compare 
the elements of S, with S, and S, with S simultaneously using the built-in operator for comparing two 
vectors which is available in many programming language such as R or MATLAB (i.e. and S; > S} directly). 
This reduces the runtime complexity from O(N?) of the original AMPD to O(N) of the T-AMPD. The 
positions of X which matched the condition are then stored in a vector, called p. list[k], with auto length (i.e. 
number of these positions). This length is surely less than N. Hence, we have a list of k vectors containing 
the potential peaks for each scale instead of a L X N matrix. 

Finally, let r be the first index of the vector among the vectors in the list where p. list[r] has the 
greatest length in the vector list, then r is the scale with the most potential peaks. The intersection of all these 
first r vectors give the vector, called p, including the indies of the true detected peaks. 


3. RESULTS AND DISCUSSION 

To demonstrate the efficiency of T-AMPD in comparison to the original AMPD, we run both 
algorithms on several signals and compare them in accuracy (i.e. false positive detection errors (FPDEs) and 
false negative detection errors (FNDEs)) and runtime performance. The algorithms are analysed in R on a 
2.60GHz Intel Core i7-4510U CPU with 8GB RAM running on Window 10 v.20H2. For the up-to-date of the 
algorithms, the original AMPD is implemented by the ampd function in the R package called ampd [31] 
which has been optimized by its authors. Furthermore, the splitting size for peak detection of AMPD in ampd 
package is restricted by 5000 for signals with the length greater than 5000 and adapted for at least 10 local 
maxima are included (see the reference manual of the package in [31] for more details). Hence, T-AMPD 
will be implemented with ng = 1 which is equivalent to the original AMPD, ny = 10 for the common 
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requirement of the AMPD and to the first one which meets at least one false positive detection error (the 
T-AMPD with different ng is denoted by T-AMPD(n)). The runtime for each case is 95% confident interval 
with 100 times of run-simulation. The first time series for comparison is the total monthly number of 
sunspots [17]. The time series includes 3177 values (< 5000) ranging from 1/1947 to 9/2013 is enclosed in 
the R package called datasets. The results of comparison are given in Table 1. 


Table 1. Comparion of T-AMPD and AMPD for sunspot data 


: Accuracy Runtime (secs) 
AlEonihmn FPDEs FNDEs (95% Conf.) 
AMPD 0 0 0.6286 + 0.018 
T-AMPD(1) 0 0 0.4757 + 0.013 
T-AMPD(10) 0 0 0.1366 + 0.0027 
T-AMPD(65) 0 0 0.0251 + 0.0008 
T-AMPD(66) 1 0 0.0242 + 0.0007 


We can see that with only no = 1, the runtime of T-AMPD is much reduced as about 75% the 
runtime of AMPD and decreases as Ng increases. With no = 10, the runtime ratio of T-AMPD and AMPD is 
at a very impressive rate, The T-AMPD is 4 times faster than the AMPD. The first false positive detection 
error is caught when 7p is up to 66 where the runtime is very small with 25 times faster. Figure 5 shows all 
23 true peaks of the signal and the value of ng when the first FPDE is found. 
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Figure 5. T-AMPD for monthly sunspot data 


The second time series for comparison is the length of day (LOD) data. In this data, the duration of a 
day according to the SI units definition (1 day=86,400s) is compared with the astronomically determined 
duration of a day on earth. We can obtain the data from the International Earth Rotation and Reference Service 
(IERS) at https://hpiers.obspm.fr/iers/eop/eopc04/ with 1461 days (< 5000) from 2008 to 2011. The analysis 
shows that the results of T-AMPD with nọ from | to 207 have identical detection rates to AMPD where all of 
them have one false negative detection error. All the T-AMDPs with ny from 208 to 729 have no false detection 
occurred (see Figure 6). The false positive detection error only occurs when ng from 730 above. 
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Figure 6. T-AMPD for LOD data 
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The runtime performance of T-AMDP against AMDP is given in Table 2. The dramatically improve 
ment in runtime of T-AMPD is the same to the sunspot analysis case. We can see that with nọ = 10 (for 
general cases), the T-AMPD is more than 5 times faster than the original AMPD. From these peaks we are 
easily to estimate the average period of the longest days is 13.6 days which is adapted with the value reported 
in previous literature [32]. 


Table 2. Comparion of T-AMPD and AMPD for LOD data 


: Accuracy Runtime (secs) 
abou FPDEs FNDEs (95% Conf.) 

AMPD 0 1 0.1471 + 0.0057 
T-AMPD(1) 0 1 0.1011 + 0.004 
T-AMPD(10) 0 1 0.0273 + 0.001 
T-AMPD(207) 0 1 0.0028 + 0.00026 
T-AMPD(208) 0 0 0.0025 + 0.00035 
T-AMPD(729) 0 0 0.00153 + 0.00015 
T-AMPD(730) 7 0 0.00138 + 0.00014 


Finally, we analysis with a low-noise electrocardiography (ECG) time series with 200000 values (> 
5000) downloaded from https://plux.info/sensors/277 -electrocardiogram-ecg.html. Unfortunately, the original 
AMPD cannot run on our device in default setting because it cannot allocate the vector of too large size 
needed for the matrix LMS. Our T-AMPD still run well regardless the size of the scanning window though 
the runtime is a bit long. For fairly comparison, we set the splitting=TRUE for AMPD in order to limit the 
scanning window by 5000 which ensures the algorithm run normally, and our T-AMPD is also limited the 
scanning window in the same size. The results are given in Table 3. 


Table 3. Comparion of T-AMPD and AMPD for ECG data 


: Accuracy Runtime (secs) 
Apari FPDEs FNDEs (95% Conf.) 
AMPD 0 0 238.2067 + 0.9802 
T-AMPD(1) 0 0 144.036 + 0.7275 
T-AMPD(10) 0 0 27.1262 + 0.4094 


For the ECG signal with a large number of values as in this case, we can see in Table 3 that T-AMPD 
has approximately a duration of 144.036s (~2.4 mins) which is approximately a half of AMPD runtime with 
No = 1, and it is only approximately 27.12s with nọ = 10 which is more than 8 times faster than AMPD. 
Figure 7 shows the detected true peaks of the ECG signal run by T-AMPD at nọ = 10. We can see all 229 true 
peaks are detected from 200000 observations without any errors, exactly the same as the original AMPD. 
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Figure 7. T-AMPD for ECG data with all 229 true peaks are detected at nọ = 10 
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4. CONCLUSION 

In this paper, we have proposed some important adjustments for the AMPD algorithm in improving 
its runtime performance while maintaining the accuracy by introducing the desirable number of detected 
peaks parameter, called ng. The experimental results show that even with ng = 1, our algorithm (T-AMPD) 
greatly improves the computation time with at least 2 times faster in comparison to the AMPD’s runtime, and 
it gets even much better with larger but reasonable values of ngo. There is a way that the T-AMPD can be 
improved, that is we can fix “the best” ng for the algorithm. This “the best” ng as commented in section 2.3 
that it depends on the highest frequency fmax of the signal. In most practical areas, this fmay can be 
approximated by technical devices. 
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